Week 09: Register Allocation & Garbage Collection
Register Allocation
Register Allocation
Introduction
- Intermediate code uses unlimited temporaries
- Typical intermediate code uses too many temporaries
- Problem
- Rewrite the intermediate code to use no more temporaries than there are machine registers.
- Solution
- Assign multiple temporaries to each register
- But without changing the program behavior
- Example
- Source:
a = c + d; e = a + b; f = e - 1;
- Target:
r1 = r2 + r3; r1 = r1 + r4; r1 = r1 - 1;
- Theorem
- Temporaries t1 and t2 can share the same register if at any point in the program at most one of t1 or t2 is live.
- If t1 and t2 are live at the same time, they cannot share a register.
- Compute live variables for each point
- Algo
- Construct an undirected graph.
- A node for each temporary.
- An edge between t1 and t2 if they are live simultaneously at some point in the program.
- This is the register interference graph(RIG)
- Two temporaries can be allocated to the same register if there is no edge connecting them
- Sample
b
and c
cannot be in the same register
b
and d
could be in the same register
Graph Coloring
- Coloring
- A coloring of a graph is an assignment of colors to nodes, such that nodes connected by an edge have different colors.
- A graph is k-colorable if it has a coloring with k colors.
- NP-Hard
- RIG is a coloring problem.
- Heuristic
- Decide coloring orders
- Pick a node t with fewer than k neighbors
- Put t on a stack and remove it from the RIG
- Repeat until the graph is empty
- Assign the color
- Start with the last node added
- At each step pick a color different from those assigned to already colored neighbors
Spilling
Introduction
- If the graph coloring heuristic fails to find a coloring, then we can’t hold all values in registers.
- Some values are spilled to memory.
- What if all nodes have k or more neighbors?
- Pick a node as a candidate for spilling
- A spilled value lives in memory
- Assume f is chosen. Remove f and continue the simplification.
- If optimistic coloring fails, we spill f.
- Allocate a memory location for f like
$fa
.
- Before each operation that reads f, insert
f = load fa
.
- After each operation that writes f, insert
store f, fa
.
- Note f has been split into three temporaries.
fi
is live only
- Between a
fi = load fa
and the next instruction
- Between a
store fi, fa
and the preceding instr.
- Spilling reduces the live range of
f
- Additional spills might be required before a coloring is found
- The tricky part is deciding what to spill
- Possible heuristics
- Spill temporaries with most conflicts
- Spill temporaries with few definitions and uses
- Avoid spilling in inner loops
Remarks
- Register allocation is a must have in compilers
- Because intermediate code uses too many temporaries
- Because it makes a big difference in performance
- Register allocation is more complicated for CISC machines.
Managing Caches
Introduction
- Compilers are very good at managing registers.
- Compilers are not good at managing caches.
- This problem is still left to programmers.
- Example
- Before:
for(j = 0 -> 9) for(i = 1 -> 10000) a[i] *= b[i];
- After:
for(i = 1 -> 10000) for(j = 0 -> 9) a[i] *= b[i];
- Might be more than 10x faster.
- A compiler can perform this loop interchange optimization.
Garbage Collection
Automatic Memory Management
Introduction
- C and C++ programs have many storage bugs
- forgetting to free unused memory
- dereferencing a dangling pointer
- overwriting parts of a data structure by accident
- When an object is created, unused space is automatically allocated
- Some space is occupied by objects that will never be used again
- This space can be freed to be reused later
- An object x is reachable if and only if:
- a register contains a pointer to x, or
- another reachable object y contains a pointer to x
- An unreachable object can never be used
Managing Memory in Cool
- Uses an accumulator
- it points to an object
- and this object may point to other objects, etc.
- And a stack pointer
- each stack frame contains pointers, e.g., method parameters
- each stack frame also contains non-pointers, e.g., return address
- Start tracing from acc and stack (roots)
- Every garbage collection scheme has the following steps
- Allocate space as needed for new objects
- When space runs out:
- Compute what objects might be used again (generally by tracing objects reachable from a set of root registers)
- Free the space used by objects not found in (1)
- Some strategies perform garbage collection before the space actually runs out.
Mark and Sweep
Introduction
- Algo
- The Mark Phase: trace reachable objects
- The Sweep Phase: collect garbage objects
- Every object has an extra bit: the mark bit
- Initial: 0
- Set to 1 for the reachable objects in the mark phase
- Any garbage object is added to the free list
- Objects with a mark bit 1 reset to 0
- A serious problem with the mark phase
- it is invoked when we are out of space
- yet it needs space to construct the queue in the mark phase
- the size of the todo list is unbounded, so we cannot reserve space for it a priori
- The todo list is used as an auxiliary data structure to perform the reachability analysis
- There is a trick that allows the auxiliary data to be stored in the objects themselves
- Space for a new object is allocated from the new list
- a block large enough is picked
- an area of the necessary size is allocated from it
- the left-over is put back in the free list
- Mark and sweep can fragment the memory
Stop and Copy
Introduction
- Memory is organized into two areas
- old space: used for allocation
- new space: used as a reserve for GC
- GC starts when the old space is full
- Copies all reachable objects from old space into new space
- After the copy the roles of the old and new spaces are reversed and the program resumes
- Use new space as old space; use old space as new space.
- Find all the reachable objects: mark and sweep
- Algo
- Copy the objects pointed to by roots and set forwarding pointers
- Follow the pointer in the next unscanned object (A)
- copy the pointed-to objects
- fix the pointer in A
- set forwarding pointer
- Follow the pointer in the next unscanned object
- Must be able to tell how large an object is when we scan it
Conservative Collection
Introduction
- GC relies on being able to find all reachable objects
- In C or C++ it is impossible to identify the contents of objects in memory
- But it is Ok to be conservative
- if a memory word looks like a pointer, it is considered a pointer
- it must be aligned
- it must point to a valid address in the data segment
- all such pointers are followed and we overestimate the set of reachable objects
- But we still cannot move objects because we cannot update pointers to them
Reference Counting
Introduction
- Try to collect an object when there are no more pointers to it
- Store in each object the number of pointers to that object
- this is the reference count
- Advantages
- easy to implement
- collects garbage incrementally without large pauses in the execution
- Disadvantages
- cannot collect circular structures
- manipulating reference counts at each assignment is very slow
- Automatic memory management prevents serious storage bugs
- But reduces programmer control
- layout of data in memory
- when is memory deallocated
- Pauses problematic in real-time applications
- Memory leaks possible (even likely)
- Garbage collection is very important
- There are more advanced garbage collection algorithms
- concurrent: allow the program to run while the collection is happening
- generational: do not scan long-lived objects at every collection
- real time: bound the length of pauses
- parallel: several collectors working at once